Method and apparatus for modeling mass spectrometer lineshapes

ABSTRACT

Methods and apparatuses are disclosed that model the lineshapes of mass spectrometry data. Ions can be modeled with an initial distribution that models molecules as having multiple positions and/or energies prior to traveling in the mass spectrometer. These initial distributions can be pushed forward by time of flight functions. Fitting can be performed between the modeled lineshapes and empirical data. Filtering can greatly reduce the dimensions of the empirical data, remove noise, compress the data, and/or recover lost and/or damaged data.

BACKGROUND OF THE INVENTION

Mass spectrometry can be applied to the search for significant signatures that characterize and diagnose diseases. These signatures can be useful for the clinical management of disease and/or the drug development process for novel therapeutics. Some areas of clinical management include detection, diagnosis and prognosis. More accurate diagnostics may be capable of detecting diseases at earlier stages.

A mass spectrometer can histogram a number of particles by mass. Time-of-flight mass spectrometers, which can include an ionization source, a mass analyzer, and a detector, can histogram ion gases by mass-to-charge ratio. Time-of-flight instruments typically put the gas through a uniform electric field for a fixed distance. Regardless of mass, all ions of the same charge pick up the same kinetic energy. The gas floats through an electric-field-free region of a fixed length. Since lighter masses have higher velocities than heavier masses given the same kinetic energy, a good separation of the time of arrival of the different masses will be observed. A histogram can be prepared for the time-of-flight of particles in the field free region, determined by mass-to-charge ratio.

Mass spectrometry with and without separations of serum samples produces large datasets. Analysis of these data sets can lead to biostate profiles, which are informative and accurate descriptions of biological state, and can be useful for clinical decision making. Large biological datasets usually contain noise as well as many irrelevant data dimensions that may lead to the discovery of poor patterns.

When analyzing a complex mixture, such as serum, that probably contains many thousands of proteins, the resulting spectral peaks show perhaps a mere hundred proteins. Also, with a large number of molecular species and a mass spectrometer with a finite resolution, the signal peaks from different molecular species can overlap. Overlapping signal peaks make different molecular species harder to differentiate, or even indistinguishable. Typical mass spectrometers can measure approximately 5% of the ionized protein molecules in a sample.

Performing analysis on raw data can be problematic, leading to unprincipled analysis of both data points and peaks. Raw data analysis can treat each data point as an independent entity. However, the intensity at a data point may be due to overlapping peaks from several molecular species. Adjacent data points can have correlated intensities, rather than independent intensities. Ad hoc peak picking involves identifying peaks in a spectrum of raw data and collapsing each peak into a single data point.

Mass spectra of simple mixtures, such as some purified proteins, can be resolved relatively easily, and peak heights in such spectra can contain sufficient information to analyze the abundance of species detected by the mass spectrometer (which is proportional to the concentration of the species in the gas-phase ion mixture). However, the mass spectra of sera or other complex mixtures can be more problematic. A complex mixture can contain many species within a small mass-to-charge window. The intensity value at any given data point may have contributions from a number of overlapping peaks from different species. Overlapping peaks can cause difficulties with accurate mass measurements, and can hide differences in mass spectra from one sample to the next. Accurate modeling of the lineshapes, or shapes of the peaks, can enhance the reliability and accurate analysis of mass spectra of complex biological mixtures. Lineshape models, or models of the peaks can also be called modeled mass-to-charge distributions.

Signal processing can aid the discovery of significant patterns from the large volume of datasets produced by separations-mass spectrometry. Mass spectral signal processing can address the resolution problem inherent in mass spectra of complex mixtures. Pattern discovery can be enhanced from signal processing techniques that remove noise, remove irrelevant information and/or reduce variance. In one application, these methods can discover preliminary biostate profiles from proteomics or other studies.

Therefore, it is desirable to reduce the noise and/or dimensionality of datasets, improve the sensitivity of mass spectrometry, and/or process the raw data generated by mass spectrometry to improve tasks such as pattern recognition.

BRIEF SUMMARY OF THE INVENTION

In some embodiments, molecules can be represented with a modeled mass-to-charge distribution detected by a mass spectrometer. The modeled mass-to-charge distribution can be based on a modeled initial distribution representing the molecules prior to traveling in the mass spectrometer. The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts. The modeled mass-to-charge distribution of the molecules and an empirical mass-to-charge distribution of the molecules can be compared.

In some embodiments, molecules can be represented by an analytic expression of a modeled mass-to-charge distribution detected by a mass spectrometer. The modeled mass-to-charge distribution can be based on a modeled initial distribution representing molecules prior to traveling in the mass spectrometer. The modeled initial distribution can represent the molecules as having multiple positions and/or multiple energies and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum.

FIG. 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum.

FIG. 3 is a simple schematic of a time-of-flight mass spectrometer.

FIG. 4 is a simple schematic of a time-of-flight mass spectrometer with a reflectron.

FIG. 5 illustrates a probability density function of a pushed forward Gaussian, showing a skew to the right.

FIG. 6 shows a change of coordinates from (x, z) to (v, θ).

FIG. 7 shows a mass spectrum.

FIG. 8 shows an expanded view of FIG. 7.

DETAILED DESCRIPTION OF THE INVENTION

The number of samples can be quite small relative to the number of data dimensions. For example, disease studies can include, in one case, on the order of 10² patients and 10⁹ data dimensions per sample.

To lessen the computational burden of pattern recognition algorithms and improve estimation of the significance of a given pattern, dimensionality reduction can be performed on the mass spectrometry data. Signal processing can ensure that processed data contains as little noise and irrelevant information as possible. This increases the likelihood that the biostate profiles discovered by the pattern recognition algorithms are statistically significant and are not obtained purely by chance.

Dimensionality reduction techniques can reduce the scope of the problem. An important tool of dimensionality reduction is the analysis of lineshapes, which are the shapes of peaks in a mass spectrum.

Lineshapes, instead of individual data points, can be interpreted in a physically meaningful way. The physics of the mass spectrometer can be used to derive mathematical models of mass spectrometry lineshapes. Ions traveling through mass spectrometers have well-defined statistical behavior, which can be modeled with probability distributions that describe lineshapes. The modeled lineshapes can represent the distribution of the time-of-flight for a given mass/charge (m/z), given factors such as the initial conditions of the ions and instrument configurations.

For specific mass spectrometer configurations, equations are derived for the flight time of an ion given its initial velocity and position. Next, a probability distribution is assumed of initial positions and/or velocities and/or other initial parameters that affect the time-of-flight based on rigorous statistical mechanical approximation techniques and/or distributions such as gaussians. Formulae are then calculated for the time-of-flight probability distributions that result from the probability-theoretical technique of “pushing forward” the initial position and/or velocity distributions by the time-of-flight equations. Each formula obtained can describe the lineshape for a mass-to-charge species.

A complex spectrum can be modeled as a mixture of such lineshapes. Using the modeled lineshapes, real spectrometric raw data of an observed mass spectrum can be deconvolved into a more informative description. The modeled lineshapes can be fitted to spectra, and/or residual error minimization techniques can be used, such as optimization algorithms with L2 and/or L1 penalties. Coefficients can be obtained that describe the components of the deconvolved spectrum.

Thus, data dimensions that describe a given peak can be collapsed into a simpler record that gives, for example, the center of the peak and the total intensity of the peak. In some cases, a broad peak in a spectrum can be replaced with much less data, which can be several m/z data points or a single m/z data point that represents the observed component's abundance in the spectrometer, which in turn is correlated with the abundance of the observed component in the original sample.

Filtering techniques (e.g., hard thresholding, soft thresholding and/or nonlinear thresholding) can be performed to de-noise and/or compress data. The processed data, with noise removed and/or having reduced dimensionality, can be one or more orders of magnitude smaller than the original raw dataset. Thus, the original raw dataset can be decomposed into chemically meaningful elements, despite the artifacts and broadening introduced by the mass spectrometer. Even in instances where peaks overlap such that they are visually indiscernible, this method can be applied to decompose the spectrum. The processed data may be roughly physically interpretable and can be much better suited for pattern recognition, due to significantly less noise, fewer data dimensions, and/or a more meaningful representation of charged states, isotopes of particular proteins, and/or chemical elements, that relate to the abundance of different molecular species.

When applied to processed data, such pattern recognition methods can identify proteins that may be indicative of disease and quantify their significance, and/or aid in the diagnosis of disease in people. Finding the proteins and/or making a disease diagnosis can be based at least partly on the modeled mass-to-charge distribution.
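As a non-limiting illustration, the hard and soft thresholding filters named above can be sketched in Python as follows. The function names, the example coefficients, and the threshold value are hypothetical and chosen only for illustration; the sketch uses the conventional textbook definitions of the two filters and is not asserted to be the particular filter of any embodiment.

```python
import math

def hard_threshold(coeffs, lam):
    """Keep coefficients whose magnitude exceeds lam; set the rest to zero."""
    return [c if abs(c) > lam else 0.0 for c in coeffs]

def soft_threshold(coeffs, lam):
    """Zero small coefficients and shrink the survivors toward zero by lam."""
    out = []
    for c in coeffs:
        mag = abs(c) - lam
        out.append(math.copysign(mag, c) if mag > 0 else 0.0)
    return out

# Hypothetical fit coefficients (e.g. lineshape amplitudes) and threshold.
coeffs = [0.02, -0.5, 3.0, -0.01, 1.2]
hard = hard_threshold(coeffs, 0.1)
soft = soft_threshold(coeffs, 0.1)
print(hard)
print(soft)
```

In this sketch, coefficients below the threshold are treated as noise and dropped, which is one way the processed record can become much smaller than the raw data.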

FIG. 1 is a flowchart illustrating one embodiment of performing signal processing on a mass spectrum. In 110, a modeled mass-to-charge distribution represents molecules that have traveled through a mass spectrometer. The modeled mass-to-charge distribution is based on at least a modeled initial distribution of any parameter affecting time-of-flight representing the molecules prior to traveling in the mass spectrometer. In 120, the modeled mass-to-charge distribution is compared with an empirical mass-to-charge distribution. Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.

FIG. 2 is a flowchart illustrating aspects of some embodiments of performing signal processing on a mass spectrum. In 210, a modeled initial distribution of one or more parameters affecting time-of-flight represents molecules prior to traveling in the mass spectrometer. In 220, the modeled initial distribution is pushed forward by time of flight functions. The modeled distribution is thereby based at least partly on the modeled initial distribution. In 230, a mass spectrometer detects an empirical distribution of molecules. This empirical distribution and the modeled distribution can be compared. In 240, a fit is performed between the empirical and modeled distributions. In 250, the fit is filtered. Various embodiments can add, delete, combine, rearrange, and/or modify parts of this flowchart.

Simple Mass Spectrometer Analyzer Configuration

FIG. 3 illustrates a simple schematic of a time-of-flight mass spectrometer. In a simple case, the mass analyzer has two chambers: the extraction region 310 and the drift region 320 (also called the field-free region), at the end of which is the detector 330. The flight axis 340 extends from the extraction chamber to the detector. One example of the effect of location in the extraction region on the time-of-flight of an ion is illustrated. Ion 360 is closer to the back of the extraction chamber than ion 370. Ion 360 is accelerated for a longer time in the extraction region 310 than ion 370. Ion 360 exits the extraction region 310 with a higher velocity than ion 370. Thus ion 360 reaches the detector 330 before ion 370.

FIG. 4 illustrates a simple schematic of a time-of-flight mass spectrometer with a reflectron. In addition to the extraction region 410, the drift region 420, and the detector 430, a reflectron 440 helps to lengthen the drift region 420 and focus the ions.

In some embodiments, the full gas content is completely localized in the extraction chamber with negligible kinetic energy in the direction of the flight axis. Other embodiments permit the gas to have some kinetic energy in the direction of the flight axis, and/or have some kinetic energy away from the direction of the flight axis. In another embodiment, the gas ions have an initial spatial distribution within the extraction source. In yet another embodiment, the gas ions have an initial spatial distribution within the extraction source and have some kinetic energy in the direction of the flight axis, and/or have some kinetic energy away from the direction of the flight axis.

In an ideal case, an extraction chamber has a potentially pulsed uniform electric field E₀ in the direction of the flight axis, and has length s₀. An ion of mass m and charge q that starts at the back of the extraction chamber will pick up kinetic energy E₀s₀q while traveling through the electric field. Suppose the field-free region has length D. If the ion has constant energy while in the field-free region, then:

$\begin{matrix}{{\frac{1}{2}{mv}^{2}} = {E_{0}s_{0}q}} & (1)\end{matrix}$

Other embodiments model an extraction chamber with a uniform electric field in a direction other than the flight axis, and/or an electric field that is at least partly nonuniform and/or at least partly time dependent.

If t_(D) is the time-of-flight in the field-free region, and ν=D/t_(D), then:

$\begin{matrix}{t_{D} = {D\sqrt{\frac{m}{2\; E_{0}s_{0}q}}}} & (2)\end{matrix}$

If not only the time-of-flight in the field-free region is of interest, but the time spent in the extraction region as well, the velocity can be expressed as a function of distance traveled (from the energy gained). If u is the distance traveled, then

${v(u)} = {\sqrt{\frac{2\; E_{0}{uq}}{m}}.}$

Both sides of dt=du/ν(u) are integrated:

$t_{ext} = {{\int_{0}^{s_{0}}{\sqrt{\frac{m}{2\; E_{0}{uq}}}{\mathbb{d}u}}} = {{\sqrt{\frac{m}{2\; E_{0}s_{0}q}} \cdot 2}\;{s_{0}.}}}$

So the total time-of-flight is t_(tot)=t_(ext)+t_(D):

$\begin{matrix}{t_{tot} = {\left( {D + {2s_{0}}} \right)\sqrt{\frac{m}{2\; E_{0}s_{0}q}}}} & (3)\end{matrix}$
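By way of illustration only, equations (2) and (3) can be evaluated numerically as in the following Python sketch. The function names and the instrument values (field strength, chamber length, drift length) are hypothetical assumptions chosen for the example and are not taken from any particular embodiment.

```python
import math

def drift_time(m, q, E0, s0, D):
    """Equation (2): time spent in the field-free region of length D."""
    return D * math.sqrt(m / (2.0 * E0 * s0 * q))

def extraction_time(m, q, E0, s0):
    """Time to cross an extraction region of length s0, starting at rest."""
    return 2.0 * s0 * math.sqrt(m / (2.0 * E0 * s0 * q))

def total_time(m, q, E0, s0, D):
    """Equation (3): extraction time plus drift time."""
    return (D + 2.0 * s0) * math.sqrt(m / (2.0 * E0 * s0 * q))

# Hypothetical instrument values in SI units (assumptions for illustration).
AMU = 1.66053906660e-27      # kilograms per atomic mass unit
E_CHARGE = 1.602176634e-19   # coulombs per elementary charge
params = dict(q=E_CHARGE, E0=1.0e5, s0=0.01, D=1.0)

t_light = total_time(m=1000 * AMU, **params)
t_heavy = total_time(m=4000 * AMU, **params)
print(t_light, t_heavy)
```

Because the flight time scales with the square root of m/q, quadrupling the mass at fixed charge doubles the flight time in this idealized model.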

Analogous equations can be derived to represent the ions as they move through other regions of a mass spectrometer.

With real world conditions, errors in the mass spectrum histogram can be seen, and the time-of-flight of a given species of mass-to-charge can have a distribution with large variance. This can be measured by widths at half-maximum height of peaks that are observed, to generate resolution statistics. The resolution of a given mass-to-charge is m/δm (where m represents mass-to-charge m/q of equation (3) and where “δm” refers to the width at the half-maximum height of the peak).

Some factors that affect the time-of-flight distributions of a given mass-to-charge species are the initial spatial distribution within the extraction chamber, and the initial kinetic energy (alternatively, initial velocity) distribution in the flight-axis direction, and/or other initial parameters including ionization, position focusing, extraction source shape, fringe effects of electric fields, and/or electronic hardware artifacts. Other embodiments can represent the initial kinetic energy (alternatively initial velocity) distribution in a direction other than the flight-axis direction.
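As one hedged illustration of the resolution statistic m/δm described above, the following Python sketch estimates the width at half-maximum of a single sampled peak by linear interpolation. The helper name `fwhm`, the sampling grid, and the synthetic gaussian test peak are assumptions made for the example only; real peaks need not be gaussian.

```python
import math

def fwhm(xs, ys):
    """Width at half maximum of one sampled, positive peak (linear interpolation)."""
    peak = max(ys)
    if peak <= 0:
        raise ValueError("peak height must be positive")
    half = peak / 2.0
    i_peak = ys.index(peak)

    i = i_peak
    while i > 0 and ys[i] > half:
        i -= 1
    j = i_peak
    while j < len(ys) - 1 and ys[j] > half:
        j += 1
    if ys[i] > half or ys[j] > half:
        raise ValueError("peak does not fall below half maximum in the window")

    # Interpolate the half-maximum crossing on each side of the apex.
    left = xs[i] + (half - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
    right = xs[j] + (half - ys[j]) * (xs[j - 1] - xs[j]) / (ys[j - 1] - ys[j])
    return right - left

# Synthetic gaussian peak (hypothetical values) centered at m/z = 1000.
center, sigma = 1000.0, 0.5
xs = [995.0 + 0.01 * k for k in range(1001)]
ys = [math.exp(-0.5 * ((x - center) / sigma) ** 2) for x in xs]

width = fwhm(xs, ys)
resolution = center / width
print(width, resolution)
```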

Choosing Initial Distributions of Species

The initial distributions of parameters of an ion species that affect the time-of-flight pushed forward by the time of flight functions can be called modeled initial distributions.

Some embodiments use distributions such as gaussian distributions of initial positions and/or energies (alternatively velocities).

Other embodiments can use various parametric distributions of initial positions and/or energies. The parameters can result from data fitting and/or by scientific heuristics. Further embodiments rely on statistical mechanical models of ion gases or statistical mechanical models of parameters that affect the time-of-flight. In many cases, the quantity of material in the extraction region is in the pico-molar range (10⁻¹² moles is on the order of 10¹¹ particles) and hence statistics are reliable. An issue is the timescale for the system to reach equilibrium. In some embodiments, equilibrium statistical mechanics can apply if the system converges to equilibrium faster than, e.g. the microsecond range.

Model of Species Distributed in Position

Some embodiments have a parametric model of the initial position distribution with a fixed initial energy. The time-of-flight distribution to be observed can be modeled. Let S be a normal random variable with mean s₀ and variance σ_(o)² << s₀. In the following calculations, the distribution of the time-of-flight in the field-free region (t_(D)) is modeled rather than the total time-of-flight (t_(tot)). Other embodiments can model the total time-of-flight, or the time-of-flight in the field regions such as constant field regions.

From (2) the time-of-flight can be a random variable t_(D)(S) and what will be observed in the mass spectrum is the probability density function of t_(D)(S). The peak shape is the density of the push-forward of the N(s₀, σ_(o)²) measure under the map t_(D): R→R. From probability theory, if U=h(S) and h is either increasing or decreasing, then the probability density functions p_(U)(u) and p_(S)(s) are related by

$\begin{matrix}{{p_{U}(u)} = {{p_{S}\left( {h^{- 1}(u)} \right)}\left| \frac{d\left( {h^{- 1}(u)} \right)}{du} \right|}} & (4)\end{matrix}$

In some embodiments, this can be a strictly decreasing function; other embodiments have an increasing function. To simplify notation, let t_(D)=ψ and Z=ψ(S). A constant is defined:

$K = {D{\sqrt{\frac{m}{2\; E_{o}q}}.}}$

From above, the probability density functions p_(Z)(z) and p_(S)(s) are related by

$p_{Z}(z) = p_{S}\left( \psi^{-1}(z) \right)\left| \frac{d\left( \psi^{-1}(z) \right)}{dz} \right|$

Solving for ψ⁻¹(z) and its derivative gives

$\psi^{-1}(z) = \frac{K^{2}}{z^{2}} \quad \text{and} \quad \frac{d\left( \psi^{-1}(z) \right)}{dz} = \frac{-2K^{2}}{z^{3}}.$

In embodiments where the probability density function p_(S)(s) is gaussian then:

$p_{S}(s) = \frac{1}{\sqrt{2\pi}\sigma_{0}}\exp\left\lbrack -\frac{\left( s - s_{0} \right)^{2}}{2\sigma_{0}^{2}} \right\rbrack$

which gives

$p_{Z}(z) = \frac{1}{\sqrt{2\pi}\sigma_{0}}\left| \frac{-2K^{2}}{z^{3}} \right|\exp\left\lbrack \left( \frac{-1}{2\sigma_{0}^{2}} \right)\left( \frac{K^{2}}{z^{2}} - s_{0} \right)^{2} \right\rbrack,\quad\text{for}\quad\frac{K}{\sqrt{2s_{0}}} \leq z < \infty$

and, for σ₀ small relative to s₀, has a maximum near

$z = \frac{K}{\sqrt{s_{0}}} = D\sqrt{\frac{m}{2E_{0}s_{0}q}}.$

By pushing forward a gaussian distribution for the spatial distribution, a skewed gaussian for t_(D)(s) is obtained.

FIG. 5 shows a probability density function p_(Z)(z) of ions with m/z=2000 and a gaussian spatial distribution N(s₀, σ_(o)²) where σ_(o)=s_(o). A clear skew to the right is shown.
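As a non-limiting sketch, the push-forward lineshape for a gaussian initial position can be evaluated and checked numerically in Python. The function names, the midpoint quadrature, and the dimensionless parameter values (K = 1, s₀ = 1, σ₀ = 0.05) are assumptions made for illustration and do not correspond to the parameters of FIG. 5.

```python
import math

def gaussian_pdf(s, mean, sigma):
    """Density of a normal distribution N(mean, sigma^2) at s."""
    return math.exp(-0.5 * ((s - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def position_lineshape(z, K, s0, sigma0):
    """p_Z(z) = p_S(K^2 / z^2) * |-2 K^2 / z^3| for a drift time z > 0."""
    s = K * K / (z * z)
    return gaussian_pdf(s, s0, sigma0) * 2.0 * K * K / z ** 3

def midpoint_integral(f, lo, hi, n=20000):
    """Plain midpoint-rule quadrature of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

# Hypothetical dimensionless parameters with sigma0 much smaller than s0.
K, s0, sigma0 = 1.0, 1.0, 0.05
area = midpoint_integral(lambda z: position_lineshape(z, K, s0, sigma0), 0.8, 1.3)
mean_z = midpoint_integral(lambda z: z * position_lineshape(z, K, s0, sigma0), 0.8, 1.3)
print(area, mean_z)
```

Under these assumptions the density integrates to approximately one, and the mean drift time lies slightly above K/√s₀, which is consistent with a tail toward longer flight times.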

Thus, it is possible to calculate and/or at least analytically approximate the probability density function of time-of-flight as a function of random variables representing the initial position and/or energy distributions. Some embodiments model simple analyzer configurations such as a single extraction region with a field and a field-free region. Other embodiments model more complicated analyzer configurations.

Model of Species Distributed in Energy

In some embodiments, the initial position is constant but the initial kinetic energy in the flight-axis direction has a gaussian distribution.

In one case, the initial distribution can be given by a N(U₀, σ₀²) random variable U. The time-of-flight in the drift region is given by

$t_{D}(u) = \psi(u) = \frac{D\sqrt{2m}}{2\sqrt{u + K}},$

where

$K = qE_{0}s_{0}.$

Then

$\psi^{-1}(t) = \frac{mD^{2}}{2t^{2}} - K,$

and

$\frac{d}{dt}\psi^{-1}(t) = -\frac{mD^{2}}{t^{3}}.$

The probability distribution of the time-of-flight Z=ψ(U) is

$\begin{matrix}{{p_{z}(z)} = {\frac{1}{\sqrt{2\pi}\sigma_{0}}\frac{m\; D^{2}}{z^{3}}{{\exp\left( {{- \frac{1}{2\sigma_{0}^{2}}}\left\{ {\frac{m\; D^{2}}{2\; z^{2}} - K - U_{0}} \right\}^{2}} \right)}.}}} & (5)\end{matrix}$
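As an illustrative check of equation (5), the Python sketch below compares the analytic density with a numerical derivative of the corresponding cumulative distribution. The function names, the finite-difference step, and the dimensionless parameter values are assumptions for the example only.

```python
import math

def energy_lineshape(z, m, D, K, U0, sigma0):
    """Equation (5): density of the drift time z for a gaussian initial energy."""
    u = m * D * D / (2.0 * z * z) - K          # psi^{-1}(z)
    jacobian = m * D * D / z ** 3              # |d psi^{-1} / dz|
    gauss = math.exp(-0.5 * ((u - U0) / sigma0) ** 2) / (sigma0 * math.sqrt(2.0 * math.pi))
    return jacobian * gauss

def tof_cdf(z, m, D, K, U0, sigma0):
    """P(Z <= z) = P(U >= psi^{-1}(z)), because psi is decreasing in u."""
    u = m * D * D / (2.0 * z * z) - K
    return 0.5 * (1.0 - math.erf((u - U0) / (sigma0 * math.sqrt(2.0))))

# Hypothetical dimensionless parameters (assumptions for illustration).
args = dict(m=1.0, D=1.0, K=0.9, U0=0.1, sigma0=0.05)
z, h = 0.70, 1.0e-5

analytic = energy_lineshape(z, **args)
numeric = (tof_cdf(z + h, **args) - tof_cdf(z - h, **args)) / (2.0 * h)
print(analytic, numeric)
```

Agreement between the two values supports the sign and the Jacobian factor used in the push-forward, under the stated assumptions.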

Another Model of Species Distributed in Position

If y denotes the initial distance of an ion from the beginning of the field-free region (0≤y≤S), and

$K = \frac{2q\; e\; E_{0}}{m}$

where

e is the charge of an electron in Coulombs

q is the integer charge of the ion

m is the mass of the ion

E₀ is the electric field strength of the extraction region

then the time-of-flight is

$\begin{matrix}{t_{tof} = {t_{ext} + t_{D}}} & (6)\end{matrix}$

where t_(tof) is the time-of-flight, t_(ext) is the time the ion spends in the extraction chamber, and t_(D) is the time the ion spends in the field-free region. We can show that:

$\begin{matrix}{t_{D} = \frac{D}{\sqrt{Ky}}} \\{and} \\{t_{ext} = {{\int_{0}^{y}\frac{\mathbb{d}s}{v(s)}}\  = \frac{2\sqrt{y}}{\sqrt{K}}}}\end{matrix}$

Combining the above two terms gives t_(tof):

$\begin{matrix}{t_{tof} = {\frac{1}{\sqrt{Ky}}\left( {{2y} + D} \right)}} & (7)\end{matrix}$

We suppose that the random variable Y, representing initial position, is distributed as

$Y \sim N(v, \tau^{2}).$

If t_(tof)=F(y), then we need to find y=F⁻¹(t). To this end, equation 7 can be rewritten as:

$\sqrt{Ky}\,t = 2y + D$

Substituting z²=y gives:

$2z^{2} - \sqrt{K}\,t\,z + D = 0$

$4z = \sqrt{K}\,t \mp \sqrt{Kt^{2} - 8D}$

$16z^{2} = 2Kt^{2} - 8D \mp 2\sqrt{K}\,t\sqrt{Kt^{2} - 8D}$

Substituting back in y

$\begin{matrix}{y = \frac{{2K\; t^{2}} - {{8D} \mp {2\sqrt{K}t\sqrt{{Kt}^{2} - {8D}}}}}{16}} & (8)\end{matrix}$

Of these two solutions, for physical reasons, the solution with the minus sign can be chosen.

Let ψ(t)=F⁻¹(t) and find the derivative with respect to t

$\begin{matrix}{{4\frac{d\psi(t)}{dt}} = {{Kt} - \frac{{K^{2}t^{2}} - {4{DK}}}{\sqrt{{K^{2}t^{2}} - {8{KD}}}}}} & (9)\end{matrix}$

From equations 8 and 9, the push forward can be calculated as

$\begin{matrix}{{p_{T}(t)} = {\frac{\left| {\psi^{\prime}(t)} \right|}{\tau\sqrt{2\;\pi}}{\exp\left( {- \frac{\left( {{\psi(t)} - v} \right)^{2}}{2\tau^{2}}} \right)}}} & (10)\end{matrix}$
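For illustration only, equations (7) through (10) can be exercised in Python as sketched below. The function names and the dimensionless values (K = 1, D = 1, y = 0.01) are hypothetical. The sketch assumes the minus-sign branch of equation (8), which applies when the initial distance y is less than D/2, and it takes the absolute value of ψ′(t) because ψ is decreasing on that branch.

```python
import math

def total_time(y, K, D):
    """Equation (7): total time-of-flight for initial distance y."""
    return (2.0 * y + D) / math.sqrt(K * y)

def inverse_position(t, K, D):
    """Equation (8), minus-sign branch: initial distance y for total time t."""
    root = math.sqrt(K * t * t - 8.0 * D)
    return (2.0 * K * t * t - 8.0 * D - 2.0 * math.sqrt(K) * t * root) / 16.0

def inverse_position_derivative(t, K, D):
    """Equation (9) divided by 4: derivative of the inverse with respect to t."""
    root = math.sqrt(K * K * t * t - 8.0 * K * D)
    return (K * t - (K * K * t * t - 4.0 * D * K) / root) / 4.0

def tof_density(t, K, D, mean_y, tau):
    """Equation (10): push-forward of N(mean_y, tau^2) through total_time."""
    psi = inverse_position(t, K, D)
    dpsi = inverse_position_derivative(t, K, D)
    gauss = math.exp(-((psi - mean_y) ** 2) / (2.0 * tau * tau))
    return abs(dpsi) * gauss / (tau * math.sqrt(2.0 * math.pi))

# Hypothetical dimensionless values (assumptions for illustration).
K, D, y = 1.0, 1.0, 0.01
t = total_time(y, K, D)
y_back = inverse_position(t, K, D)
slope = inverse_position_derivative(t, K, D)
density = tof_density(t, K, D, mean_y=0.01, tau=0.002)
print(t, y_back, slope, density)
```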

Another Model of Species Distributed in Energy

The push forward for the case with an initial energy distribution can be calculated. Suppose that the random variable X, representing initial velocity, is distributed as

$X \sim N\left( \mu,\sigma^{2} \right).$

The times spent in the field-free region and in the extraction region are then

$t_{D} = \frac{D}{\sqrt{x^{2} + KS}}$

$t_{ext} = \frac{2}{K}\left( \sqrt{x^{2} + KS} - x \right).$

Combining these terms gives an expression for t_(tof):

$\begin{matrix}{t_{tof} = {\frac{D}{\sqrt{x^{2} + {KS}}} + {\frac{2}{K}\left( {\sqrt{x^{2} + {KS}} - x} \right)}}} & (6)\end{matrix}$

Substituting $u = \sqrt{x^{2} + KS}$:

${{2u} + \frac{KD}{u} - {2\sqrt{u^{2} - {KS}}} - {Kt}} = 0$

This can be written as a polynomial of degree 3 in u:

$4tu^{3} - \left( 4S + 4D + Kt^{2} \right)u^{2} + 2KDtu - KD^{2} = 0$
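As a hedged numerical check of the cubic above, the following Python sketch computes t from a chosen initial velocity and confirms that the corresponding u makes the polynomial vanish. The function names and the dimensionless values are hypothetical assumptions, and the check presumes a nonnegative initial velocity x.

```python
import math

def total_time_from_velocity(x, K, S, D):
    """Total time-of-flight for initial velocity x: drift time plus extraction time."""
    u = math.sqrt(x * x + K * S)
    return D / u + (2.0 / K) * (u - x)

def cubic_residual(u, t, K, S, D):
    """Left-hand side of 4 t u^3 - (4S + 4D + K t^2) u^2 + 2 K D t u - K D^2."""
    return (4.0 * t * u ** 3
            - (4.0 * S + 4.0 * D + K * t * t) * u * u
            + 2.0 * K * D * t * u
            - K * D * D)

# Hypothetical dimensionless values (assumptions for illustration).
x, K, S, D = 0.3, 2.0, 0.05, 1.0
u = math.sqrt(x * x + K * S)
t = total_time_from_velocity(x, K, S, D)
residual = cubic_residual(u, t, K, S, D)
print(t, residual)
```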

Solving for u and letting A=4(D+S) gives:

$\begin{matrix}{{\frac{1}{12t}\left( {A + {Kt}^{2} + \frac{A^{2} + {2\left( {A + {12D}} \right){Kt}^{2}} + {K^{2}t^{4}}}{{f(t)}^{1/3}} + {f(t)}^{1/3}} \right)},} \\{{f(t)} = {A^{3} + {3\left( {A^{2} + {12{AD}} - {72D^{2}}} \right){Kt}^{2}} + {3\left( {A + {12D}} \right)K^{2}t^{4}} + {K^{3}t^{6}} +}} \\{12\sqrt{3}\sqrt{D^{2}{{Kt}^{2}\left( {{- A^{3}} - {4\left( {A^{2} + {9{AD}} - {27D^{2}}} \right){Kt}^{2}} - {\left( {{5A} + {68D}} \right)K^{2}t^{4}} - {2K^{3}t^{6}}} \right)}}}\end{matrix}$

Now with ψ(t), ψ′(t) can also be calculated:

$\begin{matrix}{{\psi^{\prime}(t)} = {\frac{1}{12}{t\left( {{2{Kt}} + \frac{{4\left( {A + {12D}} \right){Kt}} + {4K^{2}t^{3}}}{{f(t)}^{1/3}} +} \right.}}} \\\left. {\frac{1}{12}\left( {A + {f(t)}^{1/3} + {Kt}^{2} + \frac{A^{2} + {2\left( {A + {12D}} \right){Kt}^{2}} + {K^{2}t^{4}}}{{f(t)}^{1/3}}} \right)} \right)\end{matrix}$

Model of Combined Position and Energy

If ν is the velocity at the start of the field-free region, then the time-of-flight in the field-free region is given by

$t_{D} = \frac{D}{v}$

and the inverse by

$\psi(t) = \frac{D}{t}$

with derivative

$\psi^{\prime}(t) = -\frac{D}{t^{2}}.$

If p_(V)(ν) is the distribution of velocities at the start of the field-free region, then the corresponding time-of-flight distribution is

$p_{T}(t) = \frac{D}{t^{2}}\,p_{V}\left( \frac{D}{t} \right)$

General mass spectrometer analyzer configurations with an arbitrary number of electric field regions and field-free regions

Equations for calculating the time-of-flight of an ion through any system involving uniform electric fields can be derived from the laws of basic physics. Such equations can accurately determine the flight time as a function of the mass-to-charge ratio for any specific instrument, with distances, voltages and initial conditions. The accuracy of such calculations can be limited by uncertainties in the precise values of the input parameters and by the extent to which the simplified one-dimensional model accurately represents the real three-dimensional instrument. Other embodiments can use more than one-dimension, such as a two-dimensional, or a three-dimensional model.

Analyzers with electric fields can have at least two kinds of regions: field free regions, and constant field regions. Velocities of an ion can be traced at different regions to understand the time-of-flight. In an ideal field-free region of length L, an ion's initial and final velocities are the same and therefore the time spent in the region is

t_(Free) = L/ν_(final) = L/ν_(initial)

In other embodiments that have nonideal field-free regions with changes in velocity in the field-free region, decelerations and/or accelerations can be accounted for in the time spent in the field-free region.

In a simple constant electric field region, the velocity changes but the acceleration is constant. Using this information, supposing the acceleration (that depends on mass) is a in a region of length L, the time of flight is

t_(ConstantField) = ν_(final)/a − ν_(initial)/a.

In other embodiments that have nonideal constant electric field regions with nonconstant acceleration, deviations from constant acceleration can be accounted for in the time spent in the constant field region.

A general formula for total time-of-flight through regions with accelerations a₁, . . . , a_(M) is given by

$t = {\sum\limits_{k = 1}^{M}\; t_{k}}$

where

$t_{k} = \left\{ \begin{matrix}{{v_{k}/a_{k}} - {v_{k - 1}/a_{k}}} & {\text{if } a_{k} \neq 0} \\{L_{k}/v_{k - 1}} & {\text{if } a_{k} = 0}\end{matrix} \right.$

The connection between ν_(k−1) and ν_(k) is given by conservation of energy.

${v_{k}^{2} - v_{k - 1}^{2}} = \left\{ \begin{matrix}0 & {\text{if } a_{k} = 0} \\{2a_{k}L_{k}} & {\text{if } a_{k} \neq 0.}\end{matrix} \right.$

As a step towards simplification, note that

${\frac{v_{k}}{a_{k}} - \frac{v_{k - 1}}{a_{k}}} = {{\frac{1}{a_{k}}\left( {v_{k} - v_{k - 1}} \right)}\mspace{115mu} = {{\frac{1}{a_{k}}\frac{v_{k}^{2} - v_{k - 1}^{2}}{v_{k} + v_{k - 1}}}\mspace{115mu} = {{\frac{1}{a_{k}}\frac{2a_{k}L_{k}}{v_{k} + v_{k - 1}}}\mspace{115mu} = {\frac{2L_{k}}{v_{k} + v_{k - 1}}.}}}}$

This leads to a unified formula for total time-of-flight:

$t = {\sum\limits_{k = 1}^{M}\frac{2L_{k}}{v_{k} + v_{k - 1}}}$

Next, a simple inductive argument shows

$v_{k}^{2} = {{\sum\limits_{j = 1}^{k}{2a_{j}L_{j}}} + {v_{0}^{2}.}}$

Letting

${P_{k} = {\sum\limits_{j = 1}^{k}{2a_{j}L_{j}}}},$

we rewrite the time-of-flight formula as

$\begin{matrix}{t = {\sum\limits_{k = 1}^{M}{\frac{2L_{k}}{\sqrt{P_{k} + v_{0}^{2}} + \sqrt{P_{k - 1} + v_{0}^{2}}}.}}} & (6)\end{matrix}$
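By way of a non-limiting example, the unified formula can be compared against the region-by-region formulas in Python. The region list, the initial velocity, and the function names below are hypothetical. Each region is described by an assumed pair (a_k, L_k) of acceleration and length, with a_k = 0 denoting a field-free region.

```python
import math

def tof_unified(regions, v0):
    """Sum of 2 L_k / (v_k + v_{k-1}), with v_k^2 = v_{k-1}^2 + 2 a_k L_k."""
    t = 0.0
    v_prev_sq = v0 * v0
    for a, L in regions:
        v_sq = v_prev_sq + 2.0 * a * L
        t += 2.0 * L / (math.sqrt(v_sq) + math.sqrt(v_prev_sq))
        v_prev_sq = v_sq
    return t

def tof_piecewise(regions, v0):
    """Region-by-region times: (v_k - v_{k-1}) / a_k, or L_k / v_{k-1} if a_k is 0."""
    t = 0.0
    v_prev = v0
    for a, L in regions:
        if a == 0.0:
            t += L / v_prev
            v = v_prev
        else:
            v = math.sqrt(v_prev * v_prev + 2.0 * a * L)
            t += (v - v_prev) / a
        v_prev = v
    return t

# Hypothetical three-region analyzer: field, field-free drift, second field.
regions = [(2.0, 0.5), (0.0, 3.0), (1.0, 1.0)]
t_unified = tof_unified(regions, v0=0.1)
t_piecewise = tof_piecewise(regions, v0=0.1)

# Two-region special case starting at rest, comparable with equation (3).
t_simple = tof_unified([(2.0, 0.5), (0.0, 3.0)], v0=0.0)
print(t_unified, t_piecewise, t_simple)
```

In the two-region case starting at rest, the result reduces to (D + 2s₀)/√(2as₀) with a = qE₀/m, which matches equation (3) under the sketch's assumptions.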

If we collect the initial conditions s₀ and ν₀ in one term

I(s₀, ν₀) = a₁s₀ + ν₀²,

then it is clear that we have nonnegative constants Q₁, . . . , Q_(M) such that

$t = {{\psi(I)} = {\sum\limits_{k = 1}^{M}{\frac{1}{\sqrt{Q_{k} + I} + \sqrt{Q_{k - 1} + I}}.}}}$

Taking a derivative shows that this is a strictly decreasing function for I>0 and therefore has an inverse. The derivative of the inverse of this function is of interest because, according to (4), it enters the push-forward density as a factor and hence has a strong impact on the shape of the push-forward distribution.

Next is introduced a procedure for calculating the inverse ψ⁻¹(t) of ψ(I). It can be observed that if

$\sqrt{x + a} - \sqrt{x} = z$

then

$x = {\left( \frac{a - z^{2}}{2z} \right)^{2}.}$
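The identity above can be confirmed with a short Python sketch. The function name and the sample values are hypothetical, and the sketch assumes 0 < z ≤ √a so that √x is nonnegative.

```python
import math

def x_from_difference(z, a):
    """Solve sqrt(x + a) - sqrt(x) = z for x, assuming 0 < z <= sqrt(a)."""
    return ((a - z * z) / (2.0 * z)) ** 2

# Hypothetical sample values: start from a known x and recover it.
x_true, a = 3.0, 5.0
z = math.sqrt(x_true + a) - math.sqrt(x_true)
x_recovered = x_from_difference(z, a)
print(z, x_recovered)
```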

If any of the t₁, . . . , t_(M) is known, then it would be easy to calculate I. In one approach, these t_(k) can be backed out in stages until t is exhausted. The system of quadratic equations includes the following: for each 1≤k≤M:

${{\left( \frac{{a_{k}L_{k}} - t_{k}^{2}}{2t_{k}} \right)^{2} - Q_{k}} = I},$

with the constraint that the t_(k) sum to t.

Lineshapes of a Single-stage Reflectron Mass Spectrometer

Some embodiments can be applied to a mass spectrometer including three chambers and a detector: an ion extraction chamber (e.g. rectangular), a field-free drift tube, and a reflectron. The shape of the distribution of the time-of-flight of a single mass-to-charge species can be determined at least partly by the distributions of initial positions in the extraction chamber and/or the initial velocities along the flight-axis.

Approximate formulae can be derived for the time-of-flight distribution for a species of fixed mass-to-charge ratio, in this example assuming that the distributions for initial positions and velocities are gaussian. The initial positions have restricted range, and the assumption for initial position may be modified to reflect this.

The plane that separates the extraction region from the field-free drift region can be called the “drift start” plane. For a given ion the flight-axis velocity at the “drift start” plane can be referred to as the “drift start velocity.”

Basic Formulae

If x denotes the initial velocity and y denotes the initial distance of an ion from the drift-start plane (0≤y≤S), and

$K = \frac{2{qeE}_{0}}{m}$

where

e is the charge of an electron in Coulombs

q is the integer charge of the ion

m is the mass of the ion

E₀ is the electric field strength of the extraction region

then

$v(x, y) = \sqrt{x^{2} + Ky}.$

If an ion has drift-start velocity v and if

L₁ is the length of the drift region

L₂ is the distance between the drift-end plane and the detector

$D = L_{1} + L_{2}$

E₁ is the electric field strength of the reflectron, and

a=qeE₁/m is the acceleration of the ion in the reflectron

then the time-of-flight of the ion is

${T(v)} = {\frac{D}{v} + {2{\frac{v}{a}.}}}$
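The two basic formulae can be evaluated directly. The following Python sketch assumes SI units throughout; the function names and all numerical values (ion mass, field strengths, flight length, initial conditions) are hypothetical and chosen only for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e, in coulombs
DALTON = 1.66053906660e-27   # one dalton, in kilograms

def drift_start_velocity(x, y, q, m, E0):
    """v(x, y) = sqrt(x^2 + K*y) with K = 2*q*e*E0/m."""
    K = 2.0 * q * E_CHARGE * E0 / m
    return math.sqrt(x * x + K * y)

def time_of_flight(v, D, q, m, E1):
    """T(v) = D/v + 2*v/a with a = q*e*E1/m."""
    a = q * E_CHARGE * E1 / m
    return D / v + 2.0 * v / a

# Hypothetical instrument and ion (illustrative values only).
m = 1000.0 * DALTON      # ion mass, kg
q = 1                    # integer charge
E0, E1 = 1.0e5, 5.0e4    # extraction and reflectron field strengths, V/m
D = 1.0                  # total field-free path L1 + L2, m
x0, y0 = 50.0, 0.01      # initial velocity (m/s) and initial distance (m)

v = drift_start_velocity(x0, y0, q, m, E0)
t = time_of_flight(v, D, q, m, E1)
print(f"drift-start velocity {v:.1f} m/s, time of flight {t * 1e6:.2f} microseconds")
```

Note that T(v) is not monotone in v: the drift term D/v decreases with velocity while the reflectron term 2v/a increases, so T has a minimum at v = √(aD/2).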

Given a distribution p_(XY) in the (x, y)-space of initial velocities and positions, the probability density can be determined that results when this distribution is pushed forward by

$(x, y) \rightarrow v(x, y).$

The resulting density in the space of velocities can be denoted by p_(V). Next, T can be used to push forward the density p_(V) to a new density in the t-space

$p_{T} = T_{*}p_{V}.$
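One way to picture the two push-forwards is by sampling: draw initial conditions, map each draw through v(x, y) and then through T, and histogram the results. The Python sketch below is a Monte Carlo illustration under stated assumptions, not the analytic method of this disclosure: the initial distributions are taken to be Gaussian, positions outside 0 ≤ y ≤ S are rejected, and all parameter values and function names are hypothetical and dimensionless.

```python
import math
import random

def push_forward_samples(n, mu, sigma, nu, tau, K, S, D, a, seed=0):
    """Draw (x, y), keep 0 <= y <= S, and map each draw to T(v(x, y))."""
    rng = random.Random(seed)
    times = []
    while len(times) < n:
        x = rng.gauss(mu, sigma)   # initial velocity
        y = rng.gauss(nu, tau)     # initial position
        if not 0.0 <= y <= S:
            continue               # outside the extraction chamber
        v = math.sqrt(x * x + K * y)
        if v == 0.0:
            continue               # T is undefined at v = 0
        times.append(D / v + 2.0 * v / a)
    return times

def histogram(values, lo, hi, bins):
    """Count values falling in equal-width bins over [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for val in values:
        if lo <= val < hi:
            counts[min(int((val - lo) / width), bins - 1)] += 1
    return counts

# Hypothetical, dimensionless parameters.
D, a = 1.0, 4.0
times = push_forward_samples(2000, mu=0.0, sigma=0.05, nu=0.5, tau=0.1,
                             K=1.0, S=1.0, D=D, a=a)
counts = histogram(times, min(times), max(times) + 1e-9, 20)
print(counts)
```

The histogram approximates p_(T) up to normalization; the analytic expressions that follow give the same density without sampling noise.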

Expression for p_(V) in the Gaussian Case

Suppose that the random variable X, representing initial velocity, and Y, representing initial position, are distributed as

$X \sim N(\mu, \sigma^{2}), \quad Y \sim N(\nu, \tau^{2}),$

where the mean ν of Y is not to be confused with the velocity v.

The push-forward of p_(XY) under

$v(x, y) = \sqrt{x^{2} + Ky}$

can be given by integrating the measure p_(XY)(x, y)dxdy over the fibers

$\mathrm{Fiber}(v) = \{(x, y) : \sqrt{x^{2} + Ky} = v\}.$

Suppose F(x, y) is any function of x and y. Then

$E_{XY}[F] = \int_{x}\int_{y} F(x, y)\,p_{XY}(x, y)\,dx\,dy.$

Change the variables to $z = \sqrt{Ky}$. Then

${dz} = {{\frac{\sqrt{K}}{2\sqrt{y}}{dy}} = {{\frac{K}{2\sqrt{Ky}}{dy}} = {\frac{K}{2z}{{dy}.}}}}$

Therefore,

${\frac{2z}{K}{dz}} = {{dy}.}$

So

${E_{XY}\lbrack F\rbrack} = {\int_{x}^{\;}{\int_{z = 0}^{z = \sqrt{KS}}{{F\left( {x,\frac{z^{2}}{K}} \right)}{p_{XY}\left( {x,\frac{z^{2}}{K}} \right)}\frac{2}{K}z{\mathbb{d}z}{{\mathbb{d}x}.}}}}$

Now change to polar coordinates (v, θ). Care can be taken with the ranges of θ: when $v \leq \sqrt{KS}$ the range of θ is [−π/2, π/2]; however, when $v > \sqrt{KS}$ the range can be broken into two symmetric parts that consist of $[\arccos(\sqrt{KS}/v), \pi/2]$ and its mirror image. Refer to FIG. 6.

Next, change to polar coordinates z = v cos θ and x = v sin θ, without specifying the limits of θ, to get

$\begin{aligned}
E_{XY}[F] &= \int_{v}\int_{\theta} F\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) p_{XY}\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) \frac{2v}{K}\cos\theta\; v\,d\theta\,dv \\
&= \int_{v} \frac{2v^{2}}{K}\left(\int_{\theta} F\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) p_{XY}\left(v\sin\theta, \frac{v^{2}}{K}\cos^{2}\theta\right) \cos\theta\,d\theta\right) dv
\end{aligned}$

Make the change of variables u = v sin θ, so that du = v cos θ dθ. For $v \leq \sqrt{KS}$, and provided the integrand is even in u, the inner integral above becomes

$\frac{2}{v}\int_{0}^{v} F\left(u, \frac{v^{2} - u^{2}}{K}\right) p_{XY}\left(u, \frac{v^{2} - u^{2}}{K}\right) du$

An expression for p_(V) for $v \leq \sqrt{KS}$ can be given by

${{p_{v}(v)} = {\frac{4v}{K}{\int_{0}^{v}{{p_{XY}\left( {u,\frac{v^{2} - u^{2}}{K}} \right)}{\mathbb{d}u}}}}};$

and for $v \geq \sqrt{KS}$, the range of θ is $[\arccos(\sqrt{KS}/v), \pi/2]$ and the change of variables to u yields the range $[\sqrt{v^{2} - KS}, v]$, as is clear from FIG. 6:

${p_{v}(v)} = {\frac{4v}{K}{\int_{\sqrt{v^{2} - {KS}}}^{v}{{p_{XY}\left( {u,\frac{v^{2} - u^{2}}{K}} \right)}{{\mathbb{d}u}.}}}}$
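The two expressions for p_(V) can be evaluated by simple numerical quadrature. The Python sketch below assumes independent Gaussian X and Y with the mean of X equal to zero (so that the integrand is symmetric in u), uses a midpoint rule, and checks that p_(V) integrates to the probability that Y lies in [0, S]. The function names, the parameter values, and the choice of quadrature rule are assumptions made for illustration.

```python
import math

def gauss_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def p_xy(x, y, sigma, nu, tau):
    """Joint density of independent X ~ N(0, sigma^2) and Y ~ N(nu, tau^2)."""
    return gauss_pdf(x, 0.0, sigma) * gauss_pdf(y, nu, tau)

def p_v(v, K, S, sigma, nu, tau, n=200):
    """Midpoint-rule value of (4v/K) * integral of p_XY(u, (v^2 - u^2)/K) du."""
    # Lower limit is 0 for v <= sqrt(K*S) and sqrt(v^2 - K*S) otherwise.
    lo = 0.0 if v * v <= K * S else math.sqrt(v * v - K * S)
    h = (v - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += p_xy(u, (v * v - u * u) / K, sigma, nu, tau)
    return 4.0 * v / K * total * h

def total_mass(K, S, sigma, nu, tau, v_max, n=600):
    """Midpoint-rule integral of p_V over [0, v_max]."""
    h = v_max / n
    return sum(p_v((i + 0.5) * h, K, S, sigma, nu, tau) for i in range(n)) * h

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical, dimensionless parameters.
K, S, sigma, nu, tau = 1.0, 1.0, 0.3, 0.5, 0.2
mass = total_mass(K, S, sigma, nu, tau, v_max=3.0)
expected = normal_cdf((S - nu) / tau) - normal_cdf((0.0 - nu) / tau)
print(mass, expected)
```

Because the Gaussian for Y is not truncated to the chamber, the total mass is slightly less than one; renormalizing by `expected` would give a proper density on the restricted range.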

Upper and lower bounds can be explored that lead to an approximation that has accurate decay as v→∞.

Approximation by Taylor Expansion

$p_{v}(v) = \begin{cases} \dfrac{4v}{2\pi\sigma K\tau}\displaystyle\int_{0}^{v} e(u, v)\,du & v \leq \sqrt{KS} \\[2ex] \dfrac{4v}{2\pi\sigma K\tau}\displaystyle\int_{\sqrt{v^{2} - KS}}^{v} e(u, v)\,du & \sqrt{KS} \leq v < \infty \end{cases}$

where, taking μ = 0 and writing ν for the mean of Y (distinct from the velocity v),

$\begin{aligned}
e(u, v) &= \exp\left\{-\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}}\left(\frac{v^{2} - u^{2}}{K} - \nu\right)^{2}\right\} \\
&= \exp\left\{-\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}K^{2}}\left(v^{2} - u^{2} - K\nu\right)^{2}\right\} \\
&= \exp\left\{-\frac{u^{2}}{2\sigma^{2}} - \frac{1}{2\tau^{2}K^{2}}\left(u^{2} - v^{2} + K\nu\right)^{2}\right\} \\
&= \exp\left\{-\frac{1}{2\tau^{2}K^{2}}\left[u^{2}\frac{\tau^{2}K^{2}}{\sigma^{2}} + \left(u^{2} - v^{2} + K\nu\right)^{2}\right]\right\} \\
&= \exp\left\{-\frac{1}{2\tau^{2}K^{2}}\left[\left(v^{2}\frac{\tau^{2}K^{2}}{\sigma^{2}} - K\nu\frac{\tau^{2}K^{2}}{\sigma^{2}}\right) + \left(\frac{\tau^{2}K^{2}}{\sigma^{2}}\left(u^{2} - v^{2} + K\nu\right) + \left(u^{2} - v^{2} + K\nu\right)^{2}\right)\right]\right\} \\
&= \exp\left\{-\frac{v^{2}}{2\sigma^{2}} + \frac{K\nu}{2\sigma^{2}} + \frac{\tau^{2}K^{2}}{8\sigma^{4}}\right\}\exp\left\{-\frac{1}{2}\left(\frac{u^{2}}{\tau K} - \frac{v^{2}}{\tau K} + \frac{\tau K}{2\sigma^{2}} + \frac{\nu}{\tau}\right)^{2}\right\}
\end{aligned}$

Let

$\alpha = \frac{v^{2}}{\tau K} - \frac{\tau K}{2\sigma^{2}} - \frac{\nu}{\tau}$

and

$A(v) = \exp\left(-\frac{v^{2}}{2\sigma^{2}} + \frac{K\nu}{2\sigma^{2}} + \frac{\tau^{2}K^{2}}{8\sigma^{4}}\right).$

Then

$p_{v}(v) = \begin{cases} \dfrac{4v}{2\pi\sigma K\tau}A(v)\displaystyle\int_{0}^{v} \exp\left\{-\frac{1}{2}\left(\frac{u^{2}}{\tau K} - \alpha\right)^{2}\right\} du & v \leq \sqrt{KS} \\[2ex] \dfrac{4v}{2\pi\sigma K\tau}A(v)\displaystyle\int_{\sqrt{v^{2} - KS}}^{v} \exp\left\{-\frac{1}{2}\left(\frac{u^{2}}{\tau K} - \alpha\right)^{2}\right\} du & \sqrt{KS} \leq v < \infty \end{cases}$

This last integral can be simplified using a Taylor expansion. In this example, a five term expansion is used. Let

${G(x)} = {x{\int_{0}^{x}{{\exp\left( {{- \frac{1}{2}}\left( {u^{2} - x^{2} - a} \right)^{2}} \right)}{\mathbb{d}u}}}}$

Then

$G(x) = e^{-\frac{1}{2}a^{2}}\left(x^{2} - \frac{2}{3}ax^{4} + \frac{32a^{2} - 32}{120}x^{6} + O(x^{8})\right).$

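Note that, with $x = v/\sqrt{\tau K}$ and $a = \alpha - x^{2} = -\left(\frac{\tau K}{2\sigma^{2}} + \frac{\nu}{\tau}\right)$, where ν is the mean of Y,

$A(v)e^{-\frac{1}{2}a^{2}} = \exp\left(-\frac{v^{2}}{2\sigma^{2}} - \frac{\nu^{2}}{2\tau^{2}}\right).$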

Fitting Modeled Lineshapes to Empirically Observed Data

The mathematical forms derived above for the lineshapes, or shapes of peaks, of the different species based upon the underlying physics of the mass spectrometer, can be applied to the analysis of spectra. Rigorous fits can be performed between empirical mass spectra and synthetic mass spectra generated from mixtures of lineshapes.

A more complex method for fitting a mass spectrum using modeled lineshape equations uses model basis vectors, such as wavelets and/or vaguelettes. This can be done generally, and/or for a given mass spectrometer design. A basis set is a set of vectors (or sub-spectra), the combination of which can be used to model an observed spectrum. An expansion of the lineshape equations can derive a basis set that is very specific for a given mass spectrometer design.

A spectrum can be described using the basis vectors. An observed empirical spectrum can be described by a weighted sum of basis vectors, where each basis vector is weighted by multiplication by a coefficient.

Some embodiments use scaling. The linewidth of the peak corresponding to a species in a mass spectrum is dependent on the time-of-flight of the species. Thus, the linewidth in a mass spectrum may not be constant for all species. One way to address this is to rescale the spectrum such that the linewidths in the scaled spectrum are constant. Such a method can utilize the linewidth as a function of time-of-flight. This can be determined and/or be estimated analytically, empirically, and/or by simulation. Spectra with constant linewidth can be suitable for many signal processing techniques which may not apply to non-constant linewidth spectra.

Some embodiments use linear combinations and/or matched filtering. In one embodiment, a weighted sum of lineshape functions representing peaks of different species can be fitted to the observed signal by minimizing error. The post-processed data can include the resulting vector of weights, which can represent the abundance of species in the observed mass spectrum.

Fitting can assume that the spectrum has a fixed set of lineshape centers (including mass-to-charge values) c₁, c₂, . . . , c_(N) and a predetermined set of widths for each center σ₁, σ₂, . . . , σ_(N). A lineshape function such as λ(c, σ, t) may be determined for each center-width pair. A synthetic spectrum may include a weighted sum of such lineshape functions:

$S(t) = \sum\limits_{1 \leq i \leq N} w_{i}\lambda\left(c_{i}, \sigma_{i}, t\right).$

A minimal error fit can be performed to calculate the parameters w₁, . . . , w_(N). The error function could be the squared error, or a penalized squared error.
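A squared-error fit of the weights with fixed centers and widths reduces to a linear least-squares problem. The Python sketch below solves the normal equations with a small Gaussian-elimination routine. It is an illustration under stated assumptions: a symmetric Gaussian stands in for the lineshape function λ(c, σ, t) (the modeled lineshapes of this disclosure are skewed), no penalty term is used, and the function names and numerical values are hypothetical.

```python
import math

def lineshape(c, sigma, t):
    """Stand-in lineshape (assumption): unit-height Gaussian centered at c."""
    return math.exp(-0.5 * ((t - c) / sigma) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_weights(times, observed, centers, widths):
    """Minimize sum_t (observed(t) - sum_i w_i * lineshape_i(t))^2 over w."""
    cols = [[lineshape(c, s, t) for t in times] for c, s in zip(centers, widths)]
    n = len(cols)
    gram = [[sum(p * r for p, r in zip(cols[i], cols[j])) for j in range(n)]
            for i in range(n)]
    rhs = [sum(p * y for p, y in zip(cols[i], observed)) for i in range(n)]
    return solve(gram, rhs)

# Synthetic spectrum with known weights (illustrative values only).
times = [0.1 * i for i in range(200)]
centers, widths = [5.0, 8.0, 12.0], [0.5, 0.6, 0.8]
true_w = [2.0, 0.5, 1.5]
observed = [sum(w * lineshape(c, s, t) for w, c, s in zip(true_w, centers, widths))
            for t in times]
fitted_w = fit_weights(times, observed, centers, widths)
print(fitted_w)
```

A penalized squared error could be accommodated in this sketch by adding a multiple of the identity to `gram` before solving (a ridge penalty), which is one of several possible penalties.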

One advantage of this method is that it reduces the number of data dimensions, since an observed spectrum with a large number of data points can be described by a few parameters. For example, if an observed spectrum has 20,000 data points, and 20 peaks, then the spectrum can be described by 60 points consisting of 20 triplets of center, width, and amplitude. The original 20,000 dimensions have been reduced to 60 dimensions.

Some embodiments construct convolution operators. Lineshapes constructed analytically, determined empirically, and/or determined by simulation may be used to approximate a convolution operator that replaces a delta peak (e.g., an ideal peak corresponding to the time-of-flight for a particular species) with the corresponding lineshape.

Some embodiments use Fourier transform deconvolution. The Fourier transform and/or numerical fast Fourier transform of a spectrum such as the rescaled spectrum can be multiplied by a suitable function of the Fourier transform of the lineshape determined analytically, estimated empirically, and/or by simulation. The inverse Fourier transform or inverse fast Fourier transform can be applied to the resulting signal to recover a deconvolved spectrum.
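The Fourier deconvolution step can be sketched as follows. This Python example is an illustration under stated assumptions: the "suitable function" of the lineshape transform H is taken to be the regularized inverse conj(H)/(|H|² + ε), which is one common choice and not necessarily the one intended here; a naive discrete Fourier transform is used in place of a fast Fourier transform for brevity; and the short right-skewed kernel, the spike positions, and all names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a short demo)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for j, xj in enumerate(x):
            acc += xj * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve(x, h):
    """Circular convolution: replaces each delta peak in x with the kernel h."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]

def deconvolve(y, h, eps=1e-12):
    """Multiply the transform of y by conj(H) / (|H|^2 + eps), then invert."""
    Y, H = dft(y), dft(h)
    X = [Yk * Hk.conjugate() / (abs(Hk) ** 2 + eps) for Yk, Hk in zip(Y, H)]
    return [val.real for val in dft(X, inverse=True)]

n = 32
spikes = [0.0] * n
spikes[5], spikes[12], spikes[20] = 1.0, 0.5, 2.0   # ideal delta peaks
kernel = [0.5, 0.3, 0.2] + [0.0] * (n - 3)           # right-skewed lineshape
observed = circular_convolve(spikes, kernel)
recovered = deconvolve(observed, kernel)
print([round(val, 6) for val in recovered])
```

With noisy data a larger ε would typically be needed, trading resolution for noise suppression.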

Some embodiments use scaling and wavelet filtering. Any family of wavelet bases can be chosen, and used to transform a spectrum, such as a rescaled spectrum. A constant linewidth of the spectrum can be used to choose the level of decomposition for approximation and/or thresholding. The wavelet coefficients can be used to describe the spectrum with reduced dimensions and reduced noise.

Some embodiments use blocking and wavelet filtering. The spectrum can be divided into blocks whose sizes can be determined by linewidths determined analytically, estimated empirically, and/or by simulation. Any family of wavelet bases can be chosen and used to transform a spectrum, such as the raw spectrum. Different width features can be described in the wavelet coefficients at different levels. The wavelet coefficients from the appropriate decomposition levels can be used to describe the spectrum with reduced dimensions and reduced noise.

Some embodiments construct new wavelet bases. Analytical lineshapes, empirically determined lineshapes, and/or simulated lineshapes for a given configuration of a mass spectrometer can be used to construct families of wavelets. These wavelets can then be used for filtering.

Vaguelettes are another choice for basis sets. The vaguelette vectors can include vaguelettes derived from wavelet vectors, vaguelettes derived from modeled lineshapes, and/or vaguelettes derived from empirical lineshapes.

Some embodiments use wavelet-vaguelette decomposition. Another method based on wavelet filtering may be the wavelet-vaguelette decomposition. The modeled lineshape functions may be used to construct a convolution operator that replaces a delta peak with the corresponding lineshape. Any family of wavelet bases may be chosen, such as ‘db4’, ‘symmlet’, or ‘coiflet’. The convolution operator may be applied to the wavelet bases to construct a set of vaguelettes. A minimal error fit may be performed for the coefficients of the vaguelettes to the observed spectrum. The resulting coefficients may be used with the corresponding wavelet vectors to produce a deconvolved spectrum that represents abundances of species in the observed spectrum.

Some embodiments use thresholding estimators. Another method for deconvolving a rescaled spectrum is the use of the mirror wavelet bases. If the observed spectrum is y=Gx+e, and if H is the pseudo-inverse of G, and if z=He, then let K be the covariance of z. The Kalifa-Mallat mirror wavelet basis can guarantee that K is almost diagonal in that basis. The decomposition coefficients in this basis can be computed with a wavelet packet filter bank requiring O(N) operations. These coefficients can be soft-thresholded with almost optimal denoising properties for the reconstructed synthetic spectra.

Fitting a basis set to an observed empirical spectrum does not necessarily reduce the dimensionality, or the number of data points needed to describe a spectrum. However, fitting the basis set “changes the basis” and does yield coefficients (parameters) that can be filtered more easily. If many of the coefficients of the basis vectors are close to zero, then the new representation is sparse, and only some of the new basis vectors contain most of the information.

In another example of filtering noise and reducing dimensionality, thresholding can be performed on the basis vector coefficients. These methods remove or deemphasize the lowest amplitude coefficients, leaving intensity values for only the true signals. Hard thresholding sets a minimum cutoff value, and throws out any peaks whose height is under that threshold; smaller peaks may be considered to be noise. Soft thresholding can scale the numbers and then threshold. Multiple thresholds and/or scales can be used.
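The two kinds of thresholding can be sketched on a list of basis-vector coefficients. In the Python example below, hard thresholding zeroes every coefficient whose magnitude does not exceed the cutoff. For soft thresholding the sketch assumes the widely used shrinkage rule, in which surviving coefficients are also reduced in magnitude by the cutoff; this may differ from the scaling variant described above. The function names and coefficient values are hypothetical.

```python
def hard_threshold(coeffs, cutoff):
    """Keep coefficients with magnitude above the cutoff; zero the rest."""
    return [c if abs(c) > cutoff else 0.0 for c in coeffs]

def soft_threshold(coeffs, cutoff):
    """Shrink each magnitude by the cutoff (assumed rule), flooring at zero."""
    out = []
    for c in coeffs:
        mag = abs(c) - cutoff
        if mag <= 0.0:
            out.append(0.0)
        else:
            out.append(mag if c > 0 else -mag)
    return out

coeffs = [3.0, -0.5, 1.5, -2.0, 0.25]
hard = hard_threshold(coeffs, 1.0)   # [3.0, 0.0, 1.5, -2.0, 0.0]
soft = soft_threshold(coeffs, 1.0)   # [2.0, 0.0, 0.5, -1.0, 0.0]
print(hard, soft)
```

After either rule, only the nonzero coefficients need to be stored, which is where the reduction in dimensionality comes from.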

FIGS. 7 and 8 are empirical figures that show that real mass spectra have lineshapes with a skewed shape consistent with the results of the pushed-forward lineshapes.

FIG. 7 illustrates a mass spectrum of a 3 peptide mixture of angiotensin (A), bradykinin (B), and neurotensin (N). Data were collected on an electro-spray-ionization time-of-flight mass spectrometer (ESI-TOF MS). For each peptide, there are two peaks, one each for the +2 and +3 charge states. For example, A(+2) is the angiotensin +2 charge state.

FIG. 8 illustrates an expanded view of FIG. 7 to display in detail the bradykinin +2 charge state. The various peaks present are due to different isotope compositions of the bradykinin ions in the ensemble (e.g. 13C vs. 12C). By visual inspection, one can observe that the peakshapes are skewed to the right.

Conversion between time-of-flight and mass-to-charge is straightforward. For example, in some cases mass-to-charge (m/z) = 2*(extraction_voltage/flight_distance²)*time-of-flight². Thus, a time-of-flight distribution can be considered an example of a mass-to-charge distribution.
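The conversion formula can be written as a pair of functions. The Python sketch below assumes SI units, so that m/z comes out in kilograms per coulomb, and converts to daltons per elementary charge for display; the function names and the numerical values are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # one dalton, in kilograms

def tof_to_mz(t, extraction_voltage, flight_distance):
    """m/z = 2 * (extraction_voltage / flight_distance^2) * t^2, in kg/C."""
    return 2.0 * extraction_voltage / flight_distance ** 2 * t ** 2

def mz_to_tof(mz, extraction_voltage, flight_distance):
    """Inverse conversion: t = flight_distance * sqrt(mz / (2 * voltage))."""
    return flight_distance * math.sqrt(mz / (2.0 * extraction_voltage))

# Illustrative values only: 20 kV extraction, 1 m flight path, 50 microseconds.
voltage, distance, t = 2.0e4, 1.0, 5.0e-5
mz = tof_to_mz(t, voltage, distance)
mz_da_per_charge = mz * E_CHARGE / DALTON
t_back = mz_to_tof(mz, voltage, distance)
print(mz_da_per_charge, t_back)
```

Because m/z grows with the square of the time-of-flight, equal bins on the time axis correspond to bins of increasing width on the m/z axis.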

Some embodiments can run on a computer cluster. Networked computers that perform CPU-intensive tasks in parallel can run many jobs in parallel. Daemons running on the compute nodes can accept jobs and notify a server node of each node's progress. A daemon running on the server node can accept results from the compute nodes and keep track of the results. A job control program can run on the server node to allow a user to submit jobs, check on their progress, and collect results. By running computer jobs that operate independently, and distributing necessary information to the compute nodes as a pre-computation, an almost linear speedup is gained in computation time as a function of the number of compute nodes used.

Other embodiments run on individual computers, supercomputers and/or networked computers that cooperate to a lesser or greater degree. The cluster can be loosely parallel, more like a simple network of individual computers, or tightly parallel, where each computer can be dedicated to the cluster.

Some embodiments can be implemented on a computer cluster or a supercomputer. A computer cluster or a supercomputer can allow quick and exhaustive sweeps of parameter spaces to determine optimal signatures of diseases such as cancer, and/or discover patterns in cancer.

1. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a theoretical modeled mass-to-charge distribution of at least said first molecule without having said first molecule travel in a mass spectrometer using said initial distribution of said one or more parameters; and fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in said mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule.
2. The method of claim 1, wherein the fitting step includes: deriving a plurality of model basis vectors from the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of the plurality of the model basis vectors.
3. The method of claim 2, wherein the plurality of model basis vectors includes a wavelet vector.
4. The method of claim 3, wherein the wavelet vector is a standard wavelet vector.
5. The method of claim 3, wherein the wavelet vector is a wavelet vector derived from a lineshape of the modeled mass-to-charge distribution.
6. The method of claim 3, wherein the wavelet vector is a wavelet vector derived from a lineshape of the empirical mass-to-charge distribution.
7. The method of claim 2, wherein the plurality of model basis vectors includes a vaguelette vector.
8. The method of claim 7, wherein the vaguelette vector is derived from a wavelet vector.
9. The method of claim 7, wherein the vaguelette vector is derived from a lineshape of the modeled mass-to-charge distribution.
10. The method of claim 7, wherein the vaguelette vector is derived from a lineshape of the empirical mass-to-charge distribution.
11. The method of claim 2, further comprising: filtering the weighted sum of the plurality of model basis vectors.
12. The method of claim 11, wherein said filtering step includes hard thresholding.
13. The method of claim 11, wherein said filtering step includes soft thresholding.
14. The method of claim 1, wherein said fitting step comprises filtering the fitted modeled mass-to-charge distribution.
15. The method of claim 14, wherein said filtering step includes hard thresholding.
16. The method of claim 14, wherein said filtering step includes soft thresholding.
17. The method of claim 14, wherein said filtering step includes filtering with a filter bank.
18. The method of claim 14, wherein said filtering step utilizes a wavelet basis vector or a vaguelette basis vector.
19. The method of claim 1, wherein the fitting step includes an error function.
20. The method of claim 19, wherein the error function is a squared error function or a penalized squared error function.
21. The method of claim 1, wherein the fitted modeled mass-to-charge distribution is used for pattern recognition.
22. The method of claim 21, wherein said pattern recognition is used for finding one or more proteins indicative of one or more diseases.
23. The method of claim 1 wherein said one or more parameters affect time-of-flight of said first molecule.
24. The method of claim 23 wherein said one or more parameters is selected from the group consisting of: initial position, initial energy, ionization, position focusing, extraction source shape, fringe effects of electric field, statistical mechanics of ion gasses, and electronic hardware artifacts.
25. The method of claim 1 wherein said initial distribution of said one or more parameters is represented by a Gaussian distribution.
26. The method of claim 1 wherein said determining a modeled mass-to-charge distribution step utilizes a time-of-flight function.
27. The method of claim 1 wherein said fitting step involves scaling said modeled mass-to-charge distribution or said empirical mass-to-charge distribution to generate constant lineshape widths.
28. The method of claim 1 wherein said mass spectrometer is a time-of-flight mass spectrometer.
29. The method of claim 1 wherein said fitted modeled mass-to-charge distribution has reduced noise as compared to said empirical mass-to-charge distribution.
30. The method of claim 1 wherein said fitted modeled mass-to-charge distribution has compressed data as compared to said empirical mass-to-charge distribution.
31. The method of claim 1 wherein said fitted modeled mass-to-charge distribution includes recovered data as compared to said empirical mass-to-charge distribution.
32. The method of claim 1 wherein said fitted modeled mass-to-charge distribution has reduced dimensionality as compared to said empirical mass-to-charge distribution.
33. The method of claim 1 wherein said determining an initial distribution occurs prior to said first molecule traveling through said mass spectrometer.
34. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a modeled mass-to-charge distribution of at least said first molecule using said initial distribution of said one or more parameters; fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in a mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule, wherein said fitting step includes: deriving a plurality of model basis vectors from the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of said plurality of model basis vectors, wherein said plurality of model basis vectors includes a wavelet vector derived from a lineshape of said modeled mass-to-charge distribution.
35. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a modeled mass-to-charge distribution of at least said first molecule using said initial distribution of said one or more parameters; fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in a mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule, wherein said fitting step includes: deriving a plurality of model basis vectors from the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of said plurality of model basis vectors, wherein said plurality of model basis vectors includes a wavelet vector derived from a lineshape of said empirical mass-to-charge distribution.
36. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a modeled mass-to-charge distribution of at least said first molecule using said initial distribution of said one or more parameters; fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in a mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule, wherein said fitting step includes: deriving a plurality of model basis vectors from the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of said plurality of model basis vectors, wherein said plurality of model basis vectors includes a vaguelette vector derived from a lineshape of said modeled mass-to-charge distribution.
37. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a modeled mass-to-charge distribution of at least said first molecule using said initial distribution of said one or more parameters; fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in a mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule, wherein said fitting step includes: deriving a plurality of model basis vectors from the modeled mass-to-charge distribution; and representing the empirical mass-to-charge distribution with a weighted sum of said plurality of model basis vectors, wherein said plurality of model basis vectors includes a vaguelette vector derived from a lineshape of said empirical mass-to-charge distribution.
38. A method of analyzing mass spectra comprising: determining an initial distribution of one or more parameters of at least a first molecule; determining a modeled mass-to-charge distribution of at least said first molecule using said initial distribution of said one or more parameters; fitting said modeled mass-to-charge distribution to an empirical mass-to-charge distribution of at least said first molecule after it has traveled in a mass spectrometer to form a fitted modeled mass-to-charge distribution of at least said first molecule, wherein said determining a modeled mass-to-charge distribution step involves scaling said modeled mass-to-charge distribution or said empirical mass-to-charge distribution to generate constant lineshape widths.